Goto

Collaborating Authors

 normalization technique



RegBN: Batch Normalization of Multimodal Data with Regularization

Neural Information Processing Systems

However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low-or high-level features extracted from data modalities before their fusion takes place.






Reviews: Root Mean Square Layer Normalization

Neural Information Processing Systems

ORIGINALITY: The proposed normalization technique is original in the sense that the main difference in existing normalization techniques (batch, layer, group, instance..) differ only in the dimensions over which the activations are normalized. This paper proposes removing one of the typical steps in the normalization process in order to speed up training, which has been less well-studied - This work proposes dividing by the RMS statistic instead of standard deviation without hurting accuracy. Other works (for example, Santurkar et al.) experiment with scaling by different statistics, such as various l_p norms, without a loss in training accuracy. This work is not the first to suggest scaling the activations by a different statistic QUALITY: The authors tested their technique on multiple deep learning frameworks (TensorFlow, PyTorch, Theano), which gives more support to their empirical results, as different implementations can have very different timing results The authors tested their technique on multiple tasks and neural network architectures - The main hypothesis hypothesis is that the re-centering step in Layer Normalization is dispensable, and this is backed only by experimental results and could be a lot stronger with some theoretical justification - While the few experimental results show that there is no degradation of accuracy from not centering the activations, I am still not fully convinced that the centering step can be deemed unnecessary. For example, it is likely that the weights/biases of the networks in the paper are initialized such that the activations are roughly centered around zero already, and therefore the mean-centering step can be removed without seeing much of a difference in performance.


Fully Open Source Moxin-7B Technical Report

Zhao, Pu, Shen, Xuan, Kong, Zhenglun, Shen, Yixin, Chang, Sung-En, Rupprecht, Timothy, Lu, Lei, Nan, Enfu, Yang, Changdi, He, Yumei, Xu, Xingchen, Huang, Yu, Wang, Wei, Chen, Yue, He, Yong, Wang, Yanzhi

arXiv.org Artificial Intelligence

Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA and Mistral, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, and some use restrictive licenses whilst claiming to be "open-source," which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the Model Openness Framework (MOF), a ranked classification system that evaluates AI models based on model completeness and openness, adhering to principles of open science, open source, open data, and open access. Our model achieves the highest MOF classification level of "open science" through the comprehensive release of pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our model achieves superior performance in zero-shot evaluation compared with popular 7B models and performs competitively in few-shot evaluation.


Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization

Yun, Juyoung

arXiv.org Artificial Intelligence

In deep learning, Residual Networks (ResNets) have proven effective in addressing the vanishing gradient problem, allowing for the successful training of very deep networks. However, skip connections in ResNets can lead to gradient overlap, where gradients from both the learned transformation and the skip connection combine, potentially resulting in overestimated gradients. This overestimation can cause inefficiencies in optimization, as some updates may overshoot optimal regions, affecting weight updates. To address this, we examine Z-score Normalization (ZNorm) as a technique to manage gradient overlap. ZNorm adjusts the gradient scale, standardizing gradients across layers and reducing the negative impact of overlapping gradients. Our experiments demonstrate that ZNorm improves training process, especially in non-convex optimization scenarios common in deep learning, where finding optimal solutions is challenging. These findings suggest that ZNorm can affect the gradient flow, enhancing performance in large-scale data processing where accuracy is critical.


Precision Cancer Classification and Biomarker Identification from mRNA Gene Expression via Dimensionality Reduction and Explainable AI

Tabassum, Farzana, Islam, Sabrina, Rizwan, Siana, Sobhan, Masrur, Ahmed, Tasnim, Ahmed, Sabbir, Chowdhury, Tareque Mohmud

arXiv.org Artificial Intelligence

Gene expression analysis is a critical method for cancer classification, enabling precise diagnoses through the identification of unique molecular signatures associated with various tumors. Identifying cancer-specific genes from gene expression values enables a more tailored and personalized treatment approach. However, the high dimensionality of mRNA gene expression data poses challenges for analysis and data extraction. This research presents a comprehensive pipeline designed to accurately identify 33 distinct cancer types and their corresponding gene sets. It incorporates a combination of normalization and feature selection techniques to reduce dataset dimensionality effectively while ensuring high performance. Notably, our pipeline successfully identifies a substantial number of cancer-specific genes using a reduced feature set of just 500, in contrast to using the full dataset comprising 19,238 features. By employing an ensemble approach that combines three top-performing classifiers, a classification accuracy of 96.61% was achieved. Furthermore, we leverage Explainable AI to elucidate the biological significance of the identified cancer-specific genes, employing Differential Gene Expression (DGE) analysis.